Dimensionality reduction has become an important research topic as demand for interpreting high-dimensional datasets has grown rapidly in recent years. Many dimensionality reduction methods perform well at preserving the overall relationships among data points when mapping them to a lower-dimensional space. However, these existing methods do not account for differences in importance among features. To address this problem, we propose DimenFix, a novel meta-method that can operate on top of any base dimensionality reduction method that involves a gradient-descent-like process. By letting users define the importance of different features and taking that importance into account during dimensionality reduction, DimenFix creates new possibilities for visualizing and understanding a given dataset. Meanwhile, DimenFix neither increases the time cost nor reduces the quality of dimensionality reduction relative to the base method it is applied to.
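The abstract says only that DimenFix runs on top of a gradient-descent-like base method and lets users emphasize a feature. The following sketch illustrates that idea and is not the paper's algorithm. The base method here is metric MDS, optimized by plain gradient descent on raw stress. After every update, the embedding's second axis is reset to a user-chosen feature rescaled to [0, 1]. The function name `fixed_axis_mds` and the hard-reset rule are hypothetical.

```python
import numpy as np

def fixed_axis_mds(X, fixed_feature, n_iter=300, lr=0.05, seed=0):
    """Hypothetical sketch: gradient-descent metric MDS whose y-axis is pinned
    to one user-chosen feature (an assumed, simplified 'fixing' rule)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Target distances in the original space, normalized to max 1.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D = D / D.max()
    # Chosen feature rescaled to [0, 1]; it becomes the fixed coordinate.
    f = X[:, fixed_feature]
    f = (f - f.min()) / (f.max() - f.min() + 1e-12)
    Y = rng.normal(scale=0.1, size=(n, 2))
    Y[:, 1] = f
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]        # diff[i, j] = Y[i] - Y[j]
        d = np.linalg.norm(diff, axis=-1) + 1e-12   # embedding distances
        # Gradient of raw stress sum_{i<j} (d_ij - D_ij)^2 with respect to Y.
        coef = (d - D) / d
        np.fill_diagonal(coef, 0.0)
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad / n
        # Assumed fixing step: restore the pinned axis after each update.
        Y[:, 1] = f
    return Y

# Usage: embed 30 random 5-D points with feature 2 on the vertical axis.
X = np.random.default_rng(1).normal(size=(30, 5))
Y = fixed_axis_mds(X, fixed_feature=2)
```

Because the reset undoes only the y-component of each step, this is projected gradient descent: the free axis still reduces stress while the pinned axis shows the chosen feature exactly.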
Multi-label classification is becoming increasingly ubiquitous, but little attention has been paid to interpretability. In this paper, we develop a multi-label classifier that can be represented as a concise set of simple "if-then" rules, and thus offers better interpretability than black-box models. Notably, our method is able to find a small set of relevant patterns that lead to accurate multi-label classification, whereas existing rule-based classifiers are myopic and wasteful in searching for rules, requiring a large number of rules to achieve high accuracy. In particular, we formulate the problem of choosing multi-label rules as maximizing a target function that considers not only discriminative ability with respect to labels but also diversity. Accounting for diversity helps to avoid redundancy and thus to control the number of rules in the solution set. To tackle this maximization problem, we propose a 2-approximation algorithm that relies on a novel technique to sample high-quality rules. In addition to our theoretical analysis, we provide a thorough experimental evaluation, which indicates that our approach offers a trade-off between predictive performance and interpretability that is unmatched in previous work.
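The following hypothetical sketch makes the objective concrete. Rules are represented as if-then conjunctions. A greedy loop scores each candidate by its own quality plus `lam` times its summed Jaccard distance, over covered examples, to the rules already chosen. The quality function, the distance, `lam`, and the greedy loop are illustrative assumptions. The sketch is not the paper's 2-approximation algorithm or its rule-sampling technique.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # (feature_index, required_value) pairs joined by AND
    labels: frozenset      # labels predicted when every condition holds

    def fires(self, x):
        return all(x[i] == v for i, v in self.conditions)

def coverage(rule, X):
    """Indices of the rows the rule fires on."""
    return frozenset(k for k, x in enumerate(X) if rule.fires(x))

def quality(rule, X, Y):
    """Hypothetical score: fraction of all rows the rule covers AND labels correctly."""
    hits = sum(1 for k in coverage(rule, X) if rule.labels <= Y[k])
    return hits / len(X)

def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def select_rules(candidates, X, Y, k, lam=0.5):
    """Greedy: repeatedly add the rule with the best quality + lam * diversity."""
    cov = {r: coverage(r, X) for r in candidates}
    q = {r: quality(r, X, Y) for r in candidates}
    chosen = []
    for _ in range(min(k, len(candidates))):
        remaining = [r for r in candidates if r not in chosen]
        best = max(remaining, key=lambda r: q[r] + lam * sum(
            jaccard_distance(cov[r], cov[s]) for s in chosen))
        chosen.append(best)
    return chosen

def predict(rules, x):
    """Union of the labels of every rule that fires on x."""
    labels = set()
    for r in rules:
        if r.fires(x):
            labels |= r.labels
    return labels

# Usage on a toy dataset of binary feature tuples and label sets.
X = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 1)]
Y = [{"a"}, {"a", "b"}, {"b"}, set()]
r1 = Rule(frozenset({(0, 1)}), frozenset({"a"}))          # IF x0=1 THEN a
r2 = Rule(frozenset({(1, 1)}), frozenset({"b"}))          # IF x1=1 THEN b
r3 = Rule(frozenset({(0, 1), (2, 1)}), frozenset({"a"}))  # IF x0=1 AND x2=1 THEN a
r4 = Rule(frozenset({(2, 1)}), frozenset({"b"}))          # IF x2=1 THEN b
chosen = select_rules([r1, r2, r3, r4], X, Y, k=2)
print(predict(chosen, (1, 1, 0)))  # prints {'a', 'b'} (set order may vary)
```

Here the diversity term steers the second pick toward `r2`, whose coverage overlaps least with `r1`, rather than the redundant refinement `r3`.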
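Decision trees are popular classification models, offering high accuracy and intuitive explanations. However, as a tree grows, the interpretability of the model deteriorates. Traditional tree-induction algorithms, such as C4.5 and CART, rely on impurity-reduction functions that promote the discriminative power of each split. Thus, although these traditional methods are accurate in practice, there is no theoretical guarantee that they will produce small trees. In this paper, we justify the use of a general family of impurity functions, including the popular entropy and Gini-index functions, by showing that a simple enhancement can equip them with complexity guarantees. We consider a general setting in which the objects to be classified are drawn from an arbitrary probability distribution, classification can be binary or multi-class, and splitting tests are associated with non-uniform costs. As a measure of tree complexity, we adopt the expected cost of classifying an object drawn from the input distribution, which, in the uniform-cost case, is the expected number of tests. We propose a tree-induction algorithm that gives a logarithmic approximation guarantee on tree complexity. Under mild assumptions, this approximation factor is tight up to a constant factor. The algorithm recursively selects a test that maximizes a greedy criterion defined as a weighted sum of three components. The first two components encourage the selection of tests that improve the balance and the cost-efficiency of the tree, respectively, while the third, impurity-reduction component encourages the selection of more discriminative tests. As our empirical evaluation shows, compared with the original heuristics, the enhanced algorithms strike a good balance between predictive accuracy and tree complexity.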
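The abstract describes the greedy criterion only as a weighted sum of a balance term, a cost-efficiency term, and an impurity-reduction term. Below is one hypothetical instantiation for binary tests on binary features:

- balance is the probability mass of the smaller branch;
- cost-efficiency is the reciprocal of the test's cost;
- impurity reduction is the weighted Gini decrease.

These definitions, the weights `alpha`, `beta`, `gamma`, and the function names are assumptions, not the paper's formulas. The sketch shows a single greedy step. A full induction algorithm would apply it recursively to each child.

```python
import numpy as np

def gini(labels, weights):
    """Gini impurity of a node under the (weighted) input distribution."""
    total = weights.sum()
    if total == 0:
        return 0.0
    p = np.array([weights[labels == c].sum() for c in np.unique(labels)]) / total
    return 1.0 - np.sum(p ** 2)

def split_score(X, y, w, j, cost, alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical greedy criterion: weighted sum of three components."""
    left = X[:, j] == 1
    p_left = w[left].sum() / w.sum()
    p_right = 1.0 - p_left
    balance = min(p_left, p_right)            # favors balanced splits
    cost_eff = 1.0 / cost[j]                  # favors cheap tests
    impurity_gain = gini(y, w) - (p_left * gini(y[left], w[left])
                                  + p_right * gini(y[~left], w[~left]))
    return alpha * balance + beta * cost_eff + gamma * impurity_gain

def best_test(X, y, w, cost, **weights):
    """Index of the test maximizing the criterion at the current node."""
    scores = [split_score(X, y, w, j, cost, **weights) for j in range(X.shape[1])]
    return int(np.argmax(scores))

# Usage: feature 0 separates the classes perfectly; feature 1 does not.
X = np.array([[1, 1], [1, 0], [0, 0], [0, 0]])
y = np.array([0, 0, 1, 1])
w = np.full(4, 0.25)                                   # uniform input distribution
print(best_test(X, y, w, cost=np.array([1.0, 1.0])))   # prints 0
print(best_test(X, y, w, cost=np.array([10.0, 1.0])))  # prints 1
```

The second call shows the trade-off the criterion encodes. When the perfectly discriminating test becomes ten times more expensive, the cost-efficiency term outweighs its impurity advantage.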
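We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling. We study the two types of error that arise from numerical SDE simulation: the discretization error and, in the context of data subsampling, the error due to noisy gradient estimates. Our main result is a novel analysis of the effect of mini-batches through the lens of differential operator splitting, which revises previous results in the literature. The stochastic component of the Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when mini-batches are used, the best achievable error rate is $\mathcal{O}(\eta^2)$, where $\eta$ is the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.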
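The following sketch illustrates operator splitting with mini-batch gradients. It uses the standard BAOAB splitting of underdamped Langevin (Hamiltonian) dynamics with unit mass and unit temperature. The sketch then swaps exact gradients for unbiased mini-batch estimates. It is not the paper's integrator or analysis. The toy target, the function name `sg_baoab`, and all parameter values are assumptions. The target is the posterior over the mean $\theta$ of $x_i \sim \mathcal{N}(\theta, 1)$ with prior $\theta \sim \mathcal{N}(0, 100)$, chosen because its exact posterior mean and standard deviation are known in closed form.

```python
import numpy as np

def sg_baoab(x, batch_size=100, h=0.002, gamma=10.0, n_steps=30_000,
             prior_var=100.0, seed=0):
    """Stochastic-gradient Hamiltonian sampler using a BAOAB splitting.

    Assumed toy target: posterior over theta given x_i ~ N(theta, 1) and
    theta ~ N(0, prior_var); unit mass, unit temperature.
    """
    rng = np.random.default_rng(seed)
    N = len(x)

    def grad_U(theta):
        # Unbiased mini-batch estimate of the full-data potential gradient.
        idx = rng.integers(0, N, size=batch_size)
        return (N / batch_size) * np.sum(theta - x[idx]) + theta / prior_var

    c1 = np.exp(-gamma * h)            # exact Ornstein-Uhlenbeck decay over one step
    c2 = np.sqrt(1.0 - c1 ** 2)        # matching noise scale
    theta, p = 0.0, 0.0
    g = grad_U(theta)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        p -= 0.5 * h * g               # B: half kick with the noisy gradient
        theta += 0.5 * h * p           # A: half drift
        p = c1 * p + c2 * rng.normal() # O: friction + injected noise, solved exactly
        theta += 0.5 * h * p           # A: half drift
        g = grad_U(theta)              # fresh mini-batch at the new position
        p -= 0.5 * h * g               # B: half kick
        samples[t] = theta
    return samples

# Usage: compare the chain with the closed-form Gaussian posterior.
x = np.random.default_rng(42).normal(2.0, 1.0, size=1000)
samples = sg_baoab(x)
post_prec = len(x) + 1.0 / 100.0
post_mean = x.sum() / post_prec
post_std = post_prec ** -0.5
kept = samples[5000:]                  # discard burn-in
print(kept.mean(), post_mean, kept.std(), post_std)
```

In this linear-Gaussian toy, the mini-batch noise leaves the sample mean essentially unbiased but inflates the spread beyond the true posterior standard deviation. This shows, informally, how gradient noise adds an error source on top of discretization.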